Topical Clustering of MRD Senses Based on Information Retrieval Techniques

Authors

  • Jen Nan Chen
  • Jason S. Chang
Abstract

This paper describes a heuristic approach for automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in an MRD-based lexical database offers several benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so unnecessarily fine sense distinctions can be avoided. The clustered entries in the MRD can also be used as material for supervised training to develop a WSD system. Furthermore, if the algorithm is run on several MRDs, the clusters also provide a means of linking different senses across multiple MRDs to create an integrated lexical database. An implementation of the method for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE) is described. To this end, the topical word lists and topical cross-references in the Longman Lexicon of Contemporary English (LLOCE) are used. Nearly half of the senses in the LDOCE can be linked precisely to a relevant LLOCE topic using a simple heuristic. With the definitions of senses linked to the same topic viewed as a document, topical clustering of the MRD senses bears a striking resemblance to retrieval of relevant documents for a given query in information retrieval (IR) research. Relatively well-established IR techniques for weighting terms and ranking document relevance are applied to find the topical clusters that are most relevant to the definition of each MRD sense. Finally, we describe an implemented version of the algorithms for the LDOCE and the LLOCE and assess the performance of the proposed approach in a series of experiments and evaluations.
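The analogy drawn in the abstract, where each topic's pooled definitions act as a document and a new sense definition acts as a query, can be illustrated with a minimal sketch. The abstract does not say which weighting or ranking scheme the authors use, so TF-IDF weights and cosine similarity are assumed here as one common choice. The topic names, toy definitions, and function names (`build_topic_vectors`, `rank_topics`) are hypothetical and are not taken from the LDOCE or the LLOCE.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a simplifying assumption).
    return [w for w in text.lower().split() if w.isalpha()]

def build_topic_vectors(topic_docs):
    """topic_docs maps a topic label to the definitions already linked to it.
    Each topic's definitions are pooled into one 'document'."""
    tf = {t: Counter(w for d in defs for w in tokenize(d))
          for t, defs in topic_docs.items()}
    n_topics = len(tf)
    # Document frequency: in how many topic documents each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    idf = {w: math.log(n_topics / df[w]) for w in df}
    vectors = {t: {w: c * idf[w] for w, c in counts.items()}
               for t, counts in tf.items()}
    return vectors, idf

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_topics(definition, vectors, idf):
    """Treat an unlinked sense definition as a query and rank topics by
    cosine similarity to their TF-IDF vectors, best first."""
    q_tf = Counter(tokenize(definition))
    query = {w: c * idf[w] for w, c in q_tf.items() if w in idf}
    scores = [(cosine(query, vec), topic) for topic, vec in vectors.items()]
    return sorted(scores, reverse=True)

# Hypothetical toy data standing in for senses already linked to topics.
topic_docs = {
    "finance": ["a place where money is kept", "money lent at interest"],
    "geography": ["land along the side of a river", "a large stream of water"],
    "music": ["a sound made by an instrument"],
}
vectors, idf = build_topic_vectors(topic_docs)
ranked = rank_topics("the land beside a river or lake", vectors, idf)
```

In this toy setup the "geography" topic ranks first for the query definition, because the words it shares with the query occur in only one topic document and therefore carry non-zero weight, while a word common to every topic, such as "a", contributes nothing.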

Similar resources

Improving Text Classification Quality Using a Two-Level Classifier Committee

Automated text classification has gained special importance due to the increasing availability of documents in digital form and the ensuing need to organize them. Although this problem belongs to the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown better performance than the others. I...

Methods of Adaptation and Modification for Proximal Sensory Processing Disorder in Children

Background: Sensory processing abilities underlie effective responses to situations and facilitate learning, social behavior, and daily functioning. Sensory processing disorders may therefore affect daily life. The aim of this study was to present methods for modifying and accommodating the environment to suit the characteristics of children with sensory processing disorders. Met...

Word Sense Ambiguation: Clustering Related Senses

This paper describes a heuristic approach to automatically identifying which senses of a machine-readable dictionary (MRD) headword are semantically related versus those which correspond to fundamentally different senses of the word. The inclusion of this information in a lexical database profoundly alters the nature of sense disambiguation: the appropriate "sense" of a polysemous word may now c...

Sense Clusters For Information Retrieval: Evidence From Semcor And The EuroWordNet InterLingual Index

We examine three different types of sense clustering criteria with an Information Retrieval application in mind: methods based on the wordnet structure (such as generalization, cousins, sisters...); co-occurrence of senses obtained from Semcor; and equivalent translations of senses in other languages via the EuroWordNet InterLingual Index (ILI). We conclude that a) different NLP applications dem...

Sense Proximity versus Sense Relations

It has been widely assumed that sense distinctions in WordNet are often too fine-grained for applications such as Machine Translation, Information Retrieval, Text Classification, Document clustering, Question Answering, etc. This has led to a number of studies in sense clustering, i.e., collapsing sense distinctions in WordNet that can be ignored for most practical applications [1,5,6]. At the ...

Journal:
  • Computational Linguistics

Volume 24

Publication date: 1998